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Abstract 

Stochastic differential equations (SDEs) are established tools to model physical phenomena 
whose dynamics are affected by random noise. By estimating parameters of an SDE intrin- 
sic randomness of a system around its drift can be identified and separated from the drift 
itself. When it is of interest to model dynamics within a given population, i.e. to model 
simultaneously the performance of several experiments or subjects, mixed-effects mod- 
elling allows for the distinction of between and within experiment variability. A framework 
to model dynamics within a population using SDEs is proposed, representing simultane- 
ously several sources of variation: variability between experiments using a mixed-effects 
approach and stochasticity in the individual dynamics using SDEs. These stochastic differ- 
ential mixed- effects models have applications in e.g. pharmacokinetics/pharmacodynamics 
and biomedical modelling. A parameter estimation method is proposed and computational 
guidelines for an efficient implementation are given. Finally the method is evaluated using 
simulations from standard models like the two-dimensional Ornstein-Uhlenbeck (OU) and 
the square root models. 

Keywords: automatic differentiation, closed-form transition density expansion, 
maximum likelihood estimation, population estimation, stochastic differential equation, 
Cox-Ingersoll-Ross process 



1. INTRODUCTION 



Models defined through stochastic differential equations (SDEs) allow for the represen- 
tation of random variability in dynamical systems. This class of models is becoming more 
and more important (e.g. Allen (2007) and 0ksendal (2007)) and is a standard tool to 
model financial, neuronal and population growth dynamics. However, much has still to be 
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done, both from a theoretical and from a computational point of view, to make it applica- 
ble in those statistical fields that are already well established for deterministic dynamical 
models (ODE, DDE, PDE). For example, studies in which repeated measurements are 
taken on a series of individuals or experimental units play an important role in biomedical 
research: it is often reasonable to assume that responses follow the same model form for 
all experimental subjects, but parameters vary randomly among individuals. The impor- 



tance of these mixed- effects models (e.g. McCuUoch and Searle (2001), Pinheiro and Bates 



(2002)) lies in their ability to split the total variation into within- and between-individual 
components. 

The SDE approach has only recently been combined with mixed-effects models, because 
of the difficulties arising both from the theoretical and the computational side when dealing 
with SDEs. In Overgaard et al. (2005) and Torn0e et al. (2005) a stochastic differential 
mixed-effects model (SDMEM) with log-normally distributed random parameters and a 



constant diffusion term is estimated via (extended) Kalman filter. Donnet and Samson 



(2008) developed an estimation method based on a stochastic EM algorithm for fitting 



one-dimensional SDEs with mixed-effects. In Donnet et al. (2010) a Bayesian approach 



is applied to a one-dimensional model for growth curve data. The methods presented in 
the aforementioned references are computationally demanding when the dimension of the 
model and/or parameter space grows. However, they are all able to handle data contami- 
nated by measurement noise. In Ditlevsen and De Gaetano (2005) the likelihood function 
for a simple SDMEM with normally distributed random parameters is calculated explic- 
itly, but generally the likelihood function is unavailable. In Picchini et al. (2010) a com- 
putationally efficient method for estimating SDMEMs with random parameters following 
any sufficiently well-behaved continuous distribution was developed. First the conditional 
transition density of the diffusion process given the random effects is approximated in 
closed-form by a Hermite expansion, and then the obtained conditional likelihood is nu- 
merically integrated with respect to the random effects using Gaussian quadrature. The 
method resulted to be statistically accurate and computationally fast. However, in practice 
it was limited to one random effect only (see Picchini et al. (2008) for an application in 
neuroscience) since Gaussian quadrature is computationally inefficient when the number 
of random effects grows. 

Here the method presented in Picchini et al. (2010 ) is extended to handle SDMEMs with 
multiple random effects. Furthermore, the application is extended in a second direction to 
handle multidimensional SDMEMs. Computational guidelines for a practical implemen- 
tation are given, using e.g. automatic differentiation (AD) tools. Results obtained from 
simulation studies show that, at least for the examples discussed in the present work, the 
methodology is flexible enough to accommodate complex models having different distribu- 
tions for the random effects, not only the normal or log-normal distributions which are the 
ones usually employed. Satisfactory estimates for the unknown parameters are obtained 
even considering small populations of subjects (i.e. few repetitions of an experiment) and 
few observations per subject /experiment, which is often relevant, especially in biomedicine 
where large data sets are typically unavailable. 

A drawback of our approach is that measurement error is not accounted for, and thus 
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it is most useful in those situations where the variance of the measurement noise is small 
compared to the system noise. 

The paper is organized as follows. Section 2 introduces the SDMEMs, the observation 
scheme and the necessary notation. Section 3 introduces the likelihood function. Section 

4 considers approximate methods for when the expression of the exact likelihood function 
cannot be obtained; computational guidelines and software tools are also discussed. Section 

5 is devoted to the application on simulated datasets. Section 6 summarizes the results 
and the advantages and limitations of the method are discussed. An Appendix containing 
technical results closes the paper. 



2. STOCHASTIC DIFFERENTIAL MIXED EFFECTS MODELS 

In the following, bold characters are used to denote vectors and matrices. Consider 
a (i- dimensional (Ito) SDE model for some continuous process evolving in M different 
experimental units randomly chosen from a theoretical population: 

dXj = Ai(Xj,6',b^)c/t + o-(Xj,0,b^)rfWi, X^ = x^, i = l,...,M (1) 

where XJ is the value at time t > oi the iih unit, with Xq = X*^ ; G C 

W is a p-dimensional fixed effects parameter (the same for the entire population) and 
b* = b*(^) G -B C M"? is a g-dimensional random effects parameter (unit specific) with 
components (fe^^, 6*); each component h\ may follow a different continuous distribution 
(/ = 1, ...,g). Let ^^(b'l^') denote the joint distribution of the vector b*, parametrized by 
an r-dimensional parameter ^' G T C M^'. The W^'s are d-dimensional standard Brownian 
motions with components wl;^^'^ {h = l,...,d). The wj^^^^ and bj are assumed mutually 
independent for all 1 < i, j < M, l<h<d, l<l<q. The initial condition Xq is 
assumed equal to a vector of constants Xq G M'^. The drift and the diffusion coefficient 
functions fi{-) : E x Q x B ^ R'^ and cr(-) : E x & x B ^ S are assumed known up 
to the parameters, and are assumed sufficiently regular to ensure a unique weak solution 



(0ksendal (2007)), where E C M.^ denotes the state space of XJ and § denotes the set 
oi d X d positive definite matrices. Model ([T]) assumes that in each of the M units the 
evolution of X follows a common functional form, and therefore differences between units 
are due to different realizations of the Brownian motion paths {W^}^>j, and of the random 
parameters b*. Thus, the dynamics within a generic unit i are expressed via an SDE model 
driven by Brownian motion, and the introduction of a vector parameter randomly varying 
among units allows for the explanation of the variability between the M units. 

Assume that the distribution of XJ given (b*, 6) and X* = x^, s < t, has a strictly 
positive density w.r.t. the Lebesgue measure on E, which is denoted by 

x ^ ]9x(x,i(: - s|xs, b*,0) > 0, x G -E. (2) 

Assume moreover that unit i is observed at the same set of + 1 discrete time points 
{tp, t*^, . . . , t^,} for each coordinate xj:'^^^ of the process XJ (/i = 1, rf), whereas different 
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units may be observed at different sets of time points. Let x* be the {ui + 1) x d matrix 
of responses for unit i, witli jtli row given by x*(t*) = x*- = {x^p\ ...,x^j^^^), and let tlie 
following be the N x d total response matrix, = + !)• 





/ 


Xq 

Xni 


Xq 
. Jd)^ 

Xni 


\ 


x = (x^-,...,x-Y = 




x)' ■ 

(1)M 

Xq 


■ Xq 






V 


{1)M 
XriM 


XriM 


/ 



where T denotes transposition. Write A*- = t'j — f^^^ for the time distance between x*_]^ 
and X*. Notice that this observation scheme implies that the matrix of data x must not 
contain missing values. 

The goal is to estimate {0, ^) using simultaneously all the data in x. The specific 
values of the b*'s are not of interest, but only the identification of the vector-parameter ^ 
characterizing their distribution. However, estimates of the random parameters V are also 
obtained, since it is necessary to estimate them when estimating {6, ^). 

3. MAXIMUM LIKELIHOOD ESTIMATION 

To obtain the marginal density of x*, the conditional density of the data given the 
non-observable random effects b* is integrated with respect to the marginal density of the 
random effects, using that W^'^'^^ and 6/ are independent. This yields the likelihood function 

M M „ 

L{6,^) = l[p{^^\e,^) = 11 px{^^\h\e)pB{w\^)dh\ (3) 
i=i i=i -^^ 

Here p{-) is the density of x* given {6, Pb{ ) is the density of the random effects, and 
Px{') is the product of the transition densities Px{') given in ^ for a given realization of 
the random effects and for a given 0, 

Px{A^\e) = J]px(x},A}|x}_i,b\0). (4) 

i=i 

In applications the random effects are often assumed to be (multi)normally distributed, 
but Pb{-) could be any well-behaved density function. Solving the integral in ([S]) yields 
the marginal likelihood of the parameters for unit i, independent of the random effects 
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b*; by maximizing the resulting expression ([s]) with respect to 6 and ^ the corresponding 
maximum hkehhood estimators (MLE) 6 and ^ are obtained. Notice that it is possible to 
consider random effects having discrete distributions: in that case the integral becomes a 
sum and can be easily computed when the transition density px is known. In this paper 
only random effects having continuous distributions are considered. 

In simple cases an explicit expression for the likelihood function, and even explicit 
estimating equations for the MLEs can be found (Ditlevsen and De Gaetano (2005)). 
However, in general it is not possible to find an explicit solution for the integral, and thus 



exact MLEs are unavailable. This occurs when: (i) 



■ X 



is known but it is not 



possible to analytically solve the integral, and (ii) -l^]-!, ■) is unknown. In (i) the 

integral must be evaluated numerically to obtain an approximation of the likelihood (|3]). 



In (ii) first px{'x.p ■|x*_^, ■) is approximated, then the integral in (|3]) is solved numerically. 



In situation (ii) there exist several strategies to approximate the density (x* , -Ix* 



-1? 



e.g. Monte Carlo approximations, direct solution of the Fokker-Planck equation, or Hermite 
expansions, just to mention some of the possible approaches, see Hurn et al. (2007) for a 
comprehensive review focused on one- dimensional diffusions. We propose to approximate 
the transition density in closed- form using a Hermite expansion (Ai't-Sahalia (2008)). It 
often provides a good approximation to px, and Jensen and Poulsen (2002 ) showed that the 
method is computationally efficient. Using the obtained expression, the likelihood function 
is approximated, thus deriving approximated MLEs of and ^. 



4. CLOSED-FORM TRANSITION DENSITY EXPANSION AND LIKELI- 
HOOD APPROXIMATION 

4.I. Transition Density Expansion for Multidimensional SDEs 

Here the transition density expansion of a rf-dimensional time-homogeneous SDE as 



suggested in Ai't-Sahalia (2008) is reviewed; the same reference provides a method to 



handle mult i- dimensional time-inhomogeneous SDEs, but for ease of exposition attention 
is focused on the former case. Also references on further extensions, e.g. Levy processes, 
are given in the paper. We will only consider SDEs reducible to unit diffusion, i.e. multi- 
dimensional diffusions X for which there exists a one-to-one transformation to another 
diffusion with diffusion matrix the identity matrix. It is possible to perform the density 



expansion also for non-reducible SDEs (Ait-Sahalia (2008)). For the moment reference to 
is dropped when not necessary, i.e. a function f{x,0) is written f{x). 
Consider the following d-dimensional, reducible, time-homogeneous SDE 



dXt = ti{Xt)dt + (T{Xt)dWt, Xo = xo 



(5) 



and a series of d-dimensional discrete observations Xo,Xi,...,x„ from X, all observed at 
the same time points {to, ti, tn}; denote with E the state space of X. We want to 
approximate px(xj, Aj|xj_i), the conditional density of Xj given Xj_i = Xj_i, where 
Aj = tj — tj-i (j = 1, ...,n). Under regularity conditions (e.g. ^i(x) and cr(x) are assumed 



to be infinitely different iable in x on E, v(x) 



cr x cr x 



is a. dxd positive definite matrix 
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for all X in the interior of E and all the drift and diffusion components satisfy linear growth 
conditions, see Ait-Sahalia (2008) for details), the logarithm of the transition density can 
be expanded in closed form using an order J = +00 Hermite series, and approximated by 
a Taylor expansion up to order K, 

ln#)(x,,A,|x,_0 = -^ln(2vrA,) - 1 ln(det(v(x,))) + 4-^^(7(x^)l7(x,-i)) 

+ E4'^(7(x,)|7(x,-i))^. (6) 



fc=0 



Here denotes Aj raised to the power of k. The coefficients Cy"^ are given in the Appendix 
and 7(x) = (7'^^)(x), 7*^'^)(x))^ is the Lamperti transform, which by definition exists 
when the diffusion is reducible, and is such that V7(x) = cr~^(x). See the Appendix for a 
sufficient and necessary condition for reducibility. Using Ito's lemma, the transformation 
Yf = 7(Xt) defines a new diffusion process Y^, solution of the following SDE 

dYt = fiy{Yt)dt + dWt, Yo = yo, 

where the h-th element of Hy is given by {h = 1, ...,d) 

f^Pm = E({'^"'(^"'(Y,))}^y')(7-^(Y,))) 

- I E{^"'(7-'(Y*))^(7-^(Y,))^-^(7-^(Y,))| a.,,h-\Y,))a,,ij-\Y,)). 

i,j,k J hi 

For ease of interpretation the Lamperti transform and the drift term /iy for a scalar {d = 1) 
SDE are reported. Namely 7(-) is defined by 

Z""^* dv 



a{u)' 



where the lower bound of integration is an arbitrary point in the interior of E. The drift 
term is given by 

The transformation of into Y^ is a necessary step to make the transition density 
of the transformed process closer to a normal distribution, so that the Hermite expansion 
gives reasonable results. However, the reader is warned that this is by no means an easy 
task for many multivariate SDEs, and impossible for those having non-reducible diffusion 
(see Ait-Sahalia (2008 ) for details). The use of a software with symbolic algebra capabilities 
like e.g. Mathematica, Maple or Maxima is necessary to carry out the calculations. 
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4-2. Likelihood Approximation and Parameter Estimation 

(k) 

For a reducible time-homogeneous SDMEM, the coefficients Cy are obtained as in 
Section 4.1 by taking (0,b*), A* and (x*,x*_;^) instead of 6, Aj and (xj,Xj_i), respec- 
tively. Then is substituted for the unknown transition density in Q, thus obtaining 
a sequence of approximations to the likelihood function 



M „ 

L(^)(^,*) = n / pf\^'\^\d)PB{hWdW 
,-1 Jb 



(7) 



where 



n 



p(f)(x},A}|x}_i,b^ 



0) 



(8) 



and p^r is given by equation ([g]). By maximizing ([T]) with respect to {0, approximated 

MLEs 6^ and ^ are obtained. 

In general, the integral in ([T]) does not have a closed form solution, and therefore 
efficient numerical integration methods are needed; see Picchini et al. (2010) for the case of 
a single random effect (g = 1). General purpose approximation methods for one- or multi- 
dimensional integrals, irrespective of the random effects distribution, are available (e.g. 
Krommer and Ueberhuber ( 1998[ )) and implemented in several software packages, though 
the complexity of the problem grows fast when increasing the dimension of B. However, 
since exact transition densities or a closed-form approximation to px axe supposed to be 
available, analytic expressions for the integrands in ([s]) or ([T]) are known and the Laplace 
method (e.g. Shun and McCuUagh (1995)) may be used. Write b* = {b\, and define 



f{h'\e, *) = logpx(x*|b\ 6) + logpB(b*|*) 



(9) 



where px(x*|b*,0) is given in (|4]). Then log J^e^^^'^^'^^^dh^ can be approximated using a 
second order Taylor series expansion, known as Laplace approximation: 

log / e^(^'l^'*)rfb^ ~ f{h'\e, *) + - log(27r) - - log -U(h'\0, * 
Jb 2 2 

where b* = argmaxbi fih'\e, and U{h'\e, *) = d^f{W\e, *)/(9b*(9b^ is the Hessian 



of / w.r.t. b*. Thus, the log-likelihood function is approximately given by 

M 



logL(0,*)~^ 



1=1 



f{9\e, *) + I log(27r) - ^ log -H(b^|0, 



(10) 



and the values of 6 and ^ maximizing (10) are approximated MLEs. For the special 



case where —f{h^\6,^) is quadratic and convex in b* the Laplace approximation is ex- 
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act (Joe (2008)) and (10) provides the exact hkehhood function. An approximation of 
can be derived in the same way, and we denote with 



0ee,*eT 



the corresponding approximated MLE of (6, Davidian and Giltinan (2003) recommend 
to use the Laplace approximation only if Ui is "large"; however Ko and Davidian (2000) 
note that even if the n^'s are small, the inferences should still be valid if the magnitude of 
intra-individual variation is small relative to the inter-individual variation. This happens 
in many applications, e.g. in pharmacokinetics. 



In general, computing (10) or log L^^\6, ^) is non-trivial, since M independent opti- 
mization procedures must be run to obtain the b*'s and then the M Hessians H(b*|0, ^) 
must be computed. The latter problem can be solved using either (i) approximations based 
on finite differences, (ii) computing the analytic expressions of the Hessians using a symbolic 



calculus program or (iii) using automatic differentiation tools (AD, e.g. Griewank (2000)). 
We recommend to avoid method (i) since it is computationally costly when the dimen- 
sion of M and/or b* grows, whereas methods (ii)-(iii) are reliable choices, since symbolic 
packages are becoming standard in most software and are anyway necessary to calculate 

(K) 

an approximation to the transition density. However, when a symbolic package is not 
available to the user or is not of help in some specific situations, AD can be a convenient 
(if not the only possible) choice, especially when the function to be differentiated is defined 
via a complex software code; see the Conclusion for a discussion. 

In order to derive the required Hessian automatically we used the AD tool ADiMat for 



Matlab (Bischof et al. (2005)), see http://www.autodiff.org for a comprehensive list 
of other AD software. For example, assume that a user defined Matlab function named 
loglik_indiv computes the / function in ([o]) at a given value of the random effects b* 
(named b_rand): 

result = loglik_indiv(b_rand) 

so result contains the value of / at b*. The following MATLAB code then invokes ADiMat 
and creates automatically a file named g_loglik_indiv containing the code necessary to 
return the exact (to machine precision) Hessian of loglik_indiv w.r.t. b_rand: 

addif f ((Sloglik_indiv, 'b_rand',[], ' --2ndorderf wd' ) 

At this point we initialize the array and the matrix that will contain the gradient and the 
Hessian of / w.r.t. the g-dimensional vector b*: 

gradient = createFullGradients(b_rand) ; y„ inizialize the gradient 
Hessian = createHessians( [q q] , b_rand) ; 7„ inizialize the Hessian 
[Hessian, gradient] = g_loglik_indiv (Hessian, gradient, b_rand) ; 

The last line returns the desired Hessian and the gradient of / evaluated at b_rand. We 
used the Hessian either to compute the Laplace approximation or in the trust region 
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method used for the internal step optimization, read below. When it is possible to derive 
the expression for the Hessian analytically we strongly recommend to avoid the use of AD 



tools in order to speed up the estimation algorithm. For example, in Section 5.1 only two 
random effects are considered and using the Matlab Symbolic Calculus Toolbox we have 
obtained the analytic expression for the Hessian without much effort. 

For the remainder of this Section the reference to the index K is dropped, except where 
necessary, as the following apply irrespectively of whether px oi' P'x'^ is used. The mini- 
mization of — logL(0, ^) is a nested optimization problem. First the internal optimization 
step estimates the b*'s for every unit (the b*'s are sometimes known in the literature as em- 
pirical Bayes estimators). Since both symbolic calculus and AD tools provide exact results 
for the derivatives of /(b*), the values provided via AD being only limited by the computer 
precision, the exact gradient and Hessian of /(b*) can be used to minimize — /(b*) w.r.t. 
b*. We used the subspace trust-region method described in Coleman and Li (1996) and 



implemented in the Matlab f minunc function. The external optimization step minimizes 
— \ogL{6,'^) w.r.t. (0,^'), after plugging the b*'s into (10). This is a computationally 
heavy task, especially for large M, because the M internal optimization steps must be 
performed for each infinitesimal variation of the parameters {d, ^'). Therefore to perform 
the external step we reverted to derivative-free optimization methods, namely the Nelder- 



Mead simplex with additional checks on parameter bounds, as implemented by D'Errico 



(2006) for Matlab. To speed up the algorithm convergence, h^{k) may be used as starting 
value for b* in the {k + l)th iteration of the internal step, where W{k) is the estimate of b* 
at the end of the fcth iteration of the internal step. This might not be an optimal strategy, 
however it should improve over the choice made by some authors who use a vector of zeros 
as starting value for b' each time the internal step is performed. The latter strategy may 
be inefficient when dealing with highly time consuming problems, as it requires many more 
iterations. 

Once estimates for 6 and ^ are available, estimates of the random parameters b* are 
automatically given by the values of the b* at the last iteration of the external optimization 



step, see Section 5.2 for an example 



5. SIMULATION STUDIES 

In this Section the efficacy of the method is assessed through Monte Carlo simulations 
under different experimental designs. We always choose M and n to be not very large, since 
in most applications, e.g. in the biomedical context, large datasets are often unavailable. 
However, see Picchini et al. (2008) for the application of a one-dimensional SDMEM on a 



very large data set. 

5.1. Orange Trees Growth Model 

The following is as a toy example for growth models, where SDEs are used regularly, 
especially to describe animal growth that allows for non-monotone growth and can model 



unexpected changes in growth rates, see Donnet et al. (2010) for an application to chicken 
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growth, Strathe et al. (2009) for an application to growth of pigs, and Fihpe et al. (2010) 



for an apphcation to bovine data. 



In Lindstrom and Bates (1990) and Pinheiro and Bates (2002 Sections 8.1.1-8.2.1 



data from a study on the growth of orange trees are studied by means of deterministic 



nonhnear mixed-effects models using the method proposed in Lindstrom and Bates (1990). 



The data are available in the Orange dataset provided in the nlme R package (Pinheiro et al. 



(2007)). This is a balanced design consisting of seven measurements of the circumference 
of each of five orange trees. In these references, a logistic model was considered to study 
the relationship between the circumference X^'^ (mm), measured on the ith tree at age tij 
(days), and the age (i = 1, 5 and j = 1, 7): 



X 



l+exp{-{tij - 02)/03) 



(11) 



with 01 (mm), 02 (days) and 03 (days) all positive, and Sij ~ A/'(0, cr^) are i.i.d. mea- 
surement error terms. The parameter 0i represents the asymptotic value of X as time 
goes to infinity, 02 is the time value at which X = 0i/2 (the inflection point of the logis- 
tic model) and 03 is the time distance between the inflection point and the point where 
X = 0i/(l + e-i). In|Picchini et al.| (|2010D a SDMEM was derived from model ([Tl| with 



a normally distributed random effect on 0i. The likelihood approximation described in 



Section |4.2| was applied to estimate parameters, but using Gaussian quadrature instead of 
the Laplace method to solve the one-dimensional integral. Now consider a SDMEM with 
random effects on both 0i and 03. The dynamical model corresponding to (11) for the ith 
tree and ignoring the error term is given by the following ODE 



dxi 

dt 



(0i + 0i)(03 + 0i: 



-x;(0i + 0i-x,O, 



0' 



t > t: 







with (f)\ ~ A/'(0, (j^J independent of 03 ~ A/'(0, cr^^) and both independent of ~ A/'(0, a^) 
for all i and j. Now 02 only appears in the deterministic initial condition Xq = X^i = 

0i/(l + exp[(02 — tQ)/03]), where tg = 118 days for all the trees. In growth data it is 
often observed that the variance is proportional to the level, which is obtained in an SDE 
if the diffusion coefficient is proportional to the square root of the process itself. Consider 
a state-dependent diffusion coefficient leading to the SDMEM: 



dXl 



1 



(01+01)(03 + 0*3 



-x: 



+ 



Xi)dt + a^/XidW^ 



XI = xl, (12) 



A/'(0,O, 



(13) 



where a has units (mm/days) ""^Z^. Thus, = (0i,03,(t), b* = (0^,03) and ^ = (0",^^ , dt^g ) . 
Since the random effects are independent, the density pb in ^ is ^^(b*!^') = (y9(01)y9(03), 
where (p{4>{) and v^(03) are normal pdfs with means zero and standard deviations cr^j and 
(T<^3, respectively. 
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Table 1: Orange trees growth: Monte Carlo maximum likelihood estimates and 95% confidence intervals 



from 1000 simulations of model (12)-(13l, using an order K ~ 2 for the closed form density expansion 
(CFE) and the density approximation based on the Euler-Maruyama discretization (EuM). For the CFE 
method measures of symmetry are also reported. 



True parameter values 




















(7 








ii 


03 


a 








M^5, )i + l = 7 


195 


350 


0.08 


25 


52.5 


Mean CFE |95% CI| 


197.40 ]164.98, 236.97] 


356.88 ]281.74, 460.95] 


0.079 ]0.067, 0.102] 


15.68 ]1.7 X Uy 44.33] 


28.67 ]5.8 X ur" 


112.28] 












Skewness CFE 


0.35 


0.59 


0.22 


0.55 


1.11 














Kurtosis CFE 


3.30 


3.56 


3.06 


2.76 


3.88 














Mem EuM |95% CI| 


183.35 ]154.62, 217.93] 


303.75 ]236.57, 398.10] 


0.089 ]0.060, 0.123] 


12.60 ]1.5 X ia-~, 39.56] 


34.96 ]5.9 X 10-« 


112.48] 


M = 5, 11 + 1 = 20 


195 


350 


0.08 


25 


52.5 


ilcan CFE |95% CI| 


196.71 ]164.48, 236.39] 


352.16 ]274.88, 461.84] 


0.079 ]0.067, 0.090] 


15.73 ]2 X 10-', 43.67] 


30.77 ]7 X 10-*, 


114.94] 












Skewness CFE 


0.33 


0.63 


-0.04 


0.44 


1.03 














Kurtosis CFE 


3.33 


3.67 


2.93 


2.38 


3.69 














Mean EuM |95% CI| 


192.50 ]161.45, 230.09] 


339.12 ]264.82, 445.84] 


0.080 ]0.068, 0.091] 


15.01 ]1.9 X 10-', 41.48] 


32.63 ]6.4 X 10-" 


114.79] 


M = 30, n + 1 = 7 


195 


350 


0.08 


25 


52.5 


ilcan CFE |95% CI| 


196.06 ]183.41. 209.52] 


354.55 ]317.66, 395.48] 


0.081 ]0.072, 0.092] 


22.71 ]7.25, 33.4.5] 


42.18 ]1.5 X 10- 


, 73.84] 












Skewness CFE 


0.20 


0.32 


0.14 


-0.89 


-0.65 














Kurtosis CFE 


3.15 


3.29 


3.14 


5.33 


3.20 














Mean EuM |95% CI| 


182.89 ]172.02, 194.68] 


303.87 ]273.18, 341.84] 


0.093 ]0.080, 0.106] 


19.23 ]0.05, 27.82] 


48.54 ]12.13, ■ 


5.81] 


M = 30, n + 1 = 20 


195 


350 


0.08 


25 


52.5 


ilcan CFE |95% CI] 


195.62 ]183.33, 209.20] 


351.18 ]315.47, 389.21] 


0.080 ]0.076, 0.085] 


23.04 ]9.61, 34.03] 


44.83 ]2.2 X 10- 


, 74.05] 












Skewness CFE 


0.20 


0.30 


0.05 


-0.73 


-0.71 














Kurtosis CFE 


3.10 


3.27 


2.76 


5.08 


3.62 














Mean EuM |95% CI] 


191.51 ]179.93. 204.40] 


338.19 ]304.38, 374.94] 


0.081 ]0.076, 0.086] 


22.24 ]8.93, 32.83] 


46.34 ]4.4 X 10- 


, 75.03] 



We generated 1000 datasets of dimension {n + 1) x M from (12)-(13) and estimated 
{6, ^) on each dataset, thus obtaining 1000 sets of parameter estimates. This was repeated 
for (M, n + 1) = (5,7), (5,20), (30,7) and (30,20). Trajectories were generated using the 
Milstein scheme (Kloeden and Platen (1992)) with unit step size in the same time interval 
[118, 1582] as in the real data. The data were then extracted by linear interpolation from 
the simulated trajectories at the linearly equally spaced sampling times {to, ^i, ^n} for 
different values of n, where to = 118 and t„ = 1582 for every n. An exception is the case 
M = 7, = = 5, where {to, t„} = {118, 484, 664, 1004, 1231, 1372, 1582}, the same as 
in the data. 

Parameters were fixed at (Xq, 0i, 03, a, a^.^, a^g) = (30,195,350,0.08,25,52.5). The 
value for is chosen such that the coefficient of variation for (03 + 03) is 15%, i.e. 03 
has non-negligible influence. An order K = 2 approximation to the likelihood was used, 

-(2) - (2) 

see the Appendix for the coefficients. The estimates {6 , # ) have been obtained as 
described in Section 4^ and are denoted as CFE in Table [T| where CFE stands for Closed 
Form Expansion to denote that a closed-form transition density expansion technique has 
been used. 

The CFE estimates were used to produce the fit for (M, n+ 1) = (5, 20) given in Figure 



1 (a) , reporting simulated data and empirical mean of 5000 simulated trajectories from ( 12 )- 



(13) generated with the Milstein scheme using a step-size of unit length. Empirical 95% 
confidence bands of trajectory values and three example trajectories are also reported. For 
each simulated trajectory independent realizations of 

,(2) 
k 

(M, ra) = (30,20) is given in Figure 1(b) 
p < 0.001) between the estimates of 0i anu (p^, 
squares fit line. Similar relations were found when using different combinations of M and 
n. Histograms of the population parameter estimates (f)^^\ 03^ ■* and a^'^^ are given in Figure 



and 03 were produced by drawing 
from the normal distributions A/'(0, (o"^^'')^) and A/'(0, (5"^3^)^)- The corresponding fit for 



There is a positive linear correlation (r = 0.42, 
and 03, see Figure [2| reporting also the least 
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(a) + = (5,20) 



(b) (Af,n + 1) = (30,20) 



Figure 1: Orange trees growth: simulated data (circles connected by straight lines) and fit of the SDMEM 



(12l-(13) using an order K = 2 for the density expansion. In panel (a) is {M,n + 1) = (5,20) and in 
panel (b) is (M, n + 1) — (30, 20). Each panel reports the empirical mean curve (smooth solid line), 95% 
empirical confidence curves (dashed lines) and example simulated trajectories. 
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Figure 2: Orange trees growth: scatterplot of (f>^ vs (f>\ and least squares fit for (M, n + 1) = (30, 20). 
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(a) 0f ' (b) 0f (c) 

Figure 3: Orange trees growth: histogram of population parameter estimates obtained using an order 
K = 2 for the density expansion, and fits of normal probability density functions for (Af , n + 1) = (30, 20). 



[3} with normal probability density functions fitted on top. The normal densities fit well 
to the histograms with estimated means (standard deviations) equal to 195.6 (6.6), 351.2 
(19.2) and 0.080 (0.003) for Ipf^ and a^^), respectively. 

It is worthwhile to compare the methodology presented here, where a closed form expan- 
sion to the transition density is used, with a more straightforward approach, namely the ap- 
proximation to the transition density based on the so-called "one-step" Euler-Maruyama ap- 
proximation. Consider for ease of exposition a scalar SDE dXt = fi{Xt)dt+a{Xt)dWt start- 
ing at Xq = xq. The SDE can be approximated by Xt+A — Xt = fj.{Xt)A + a(Xt)A^^'^et+A 
for A "small", with {st+A} a sequence of independent draws from the standard normal 
distribution. This leads to the following transition density approximation 



Px{xt+A, A\xt) ^ '^{xt+A] Xt + fi{xt)A, a^{xt)A) 



(14) 



where (p{-] m, v) is the pdf of the normal distribution with mean m and variance v. The 



parameter estimates obtained using (14) instead of the closed-form approximation p^"* are 



given in Table [T] and are denoted EuM (standing for Euler-Maruyama). The value for A 
used in (14) is the time distance between the simulated data points. The comparisons 



between CFE and EuM for the same values of M and n have been performed using the 
same simulated datasets. The quality of the estimates obtained with the CFE method 
compared to the simple EuM approximation is considerable improved for data sampled at 
low frequency (A large), i.e. when n = 7, which is a common situation in applications. For 
(M = 30, n = 7) the 95% confidence intervals for the EuM method even fail to contain the 
true parameter values. The bad behavior of the EuM approximation when A is not small 
is well documented (e.g. Jensen and Poulsen (2002), S0rensen ( 2004[ )) and therefore our 
results are not surprising. Several experiments with different SDE models (not SDMEMs) 



have been conducted in Jensen and Poulsen (2002) where the conclusion is that although 



the CFE technique does require tedious algebraic calculations, they seem to be worth the 
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effort. 



5.2. Two- Dimensional Ornstein-Uhlenbeck Process 

The OU process has found numerous apphcations in biology, physics, engineering and 



finance, see e.g. Picchini et aL (2008) and Ditlevsen and Lansky (2005) for apphcations in 
neuroscience, or Favetto and Samson (2010) for a two-dimensional OU model describing 



the tissue microvascularization in anti-cancer therapy. 

Consider the following SDMEM of a two-dimensional OU process: 



dX, 



(i)i 



(2)i 



(2)i 



dt + aidWt^^^\ 



- ( /32i&^i(xi'^' - ai) + /322&22(^t''' - "2) ]dt + a2dW, 



(2)i 



r{2)i 
t 5 



b^, ~ r(z/„„z/„/), /,/' = l,2; t = l,...,M 

~ -^0 ' 

with positive parameters r and s and probability density function 

1 



(15) 

(16) 
(17) 



with initial values X^'"'' = Xq ,k = 1,2. Here r(r, s) denotes the Gamma distribution 



Pr[z) 



sT(r) 



z>0, 



with mean 1 when s = r~^. The parameters b]i,, Pw, ai and are strictly positive 
(/, /' = 1, 2) whereas ai and a2 are real. Let * denote element-wise multiplication. Rewrite 
the system in matrix notation as 



where 



dXi 



a2 



(3 * h\(x - X.i)dt + crdWi 



^0' 



1,...,M 



(18) 




/3 



ai 
a2 



Al /3l2 



-'21 



'\2 



^22 



(i)i 

{2)i 



X, 



(l)i 

D 

{2)i 



The matrices /3 * b* and cr are assumed to have full rank. Assume the random effects are 



mutually independent and independent of Xq and W^. Because of (17) the random effects 
have mean one and therefore E(/3 * b*) = /3 is the population mean. The set of parameters 
to be estimated in the external optimization step is = (ai, a2; /^ii, /3i2; /32i) /322) C2) 
and ^ = (z/11, z/12, 1^21, ^^22)- However, during the internal optimization step it is necessary 
to estimate the b"s also, that is 4M parameters. Thus, the total number of parameters in 
the overall estimation algorithm with internal and external steps is 12 -|- AM. 



A stationary solution to (18) exists when the real parts of the eigenvalues of /3 * b* are 



strictly positive, i.e. /3 * b* has to be positive definite. The OU process is one of the only 
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multivariate models with a known transition density other than multivariate models which 



reduce to the superposition of univariate processes. The transition density of model (18) 
for a given realization of the random effects is the bivariate Normal 



Px(x},A5|x}_i,b\0) = (27r)-i|f2|-i/2exp( 



m)/2) 



(19) 



with mean vector m 



a) exp{—{f3 * b*)AM and covariance matrix f2 



A - exp(-(/3 * b*)A*)Aexp(-(/3 * b*)^A;.), where 



2tr(/3*b^)|/3*b^ 



(1/3 * b>cr^ + (/3 * b* - tr(/3 * h')I)(T(T^{/3 *h' - tr(/3 * b*)I)^) 



is the 2x2 matrix solution of the Lyapunov equation (/3 * b*)A + A(/3 * b*)^ 
I is the 2x2 identity matrix (Gardiner (1985)). Here |A 



(Tcr and 

denotes the determinant and 



tr(A) denotes the trace of a square matrix A. 

>From (18) we generated 1000 datasets of dimension 2{n + 1) x M and estimated 
the parameters using the proposed approximated method, thus obtaining 1000 sets of 
parameter estimates. A dataset consists of 2(r2 + 1) observations at the equally spaced 
sampling times {0 = < < ' ' ' < '^n — ^} f*^^ each of the M experiments. The 



"0 ^ "-i 

observations are obtained by linear interpolation from simulated trajectories using the 

-3 dKloeden and Platen] (p!992|)). We used 



Euler-Maruyama scheme with step size equal to 10" 



the following set-up: (X^^^\ X^^^*) = (3, 3), (ai, ^2, Ai, /3i2, /32i, /322, c^i, <72) = (1, 1.5, 3, 2.5, 
1.8, 2, 0.3, 0.5,), and (z/n, z/12, 1^21, '^22) = (45, 100, 100, 25). An order K = 2 approximation 
to the likelihood was used, see the Appendix for the coefficients of the transition density 



(2) ^(2) 



are given in Table [2 



expansion. The estimates {0 
The fit for (M,2(n + 1)) 
respectively. Each figure reports the simulated data, the empirical mean of 5000 simulated 



(20,40) is given in Figures |4(a)J|4(b)J for X^^^' and 



'{2)i 



trajectories from (15)-(17), generated with the Euler-Maruyama scheme using a step size 
of length 10"'^, the empirical 95% confidence bands of trajectory values as well as five 
example trajectories. For each simulated trajectory a realization of was produced by 



drawing from the r(z/^j' 



(2) ^,{2)x_i, 



W 



distribution using the estimates given in Table 



empirical correlations of the population parameter estimates is reported in Figure [5 There 
is a strong negative correlation between the estimates of ai and 02 (r = —0.97, p < 0.001), 



The 



which are the asymptotic means for and X'/'\ The sum &i + results always 

almost exactly equal to 2.5 in each dataset (mean = 2.50, standard deviation = 0.04), so 
the sum is more precisely determined than each mean parameter. This occurs because there 
is a strong negative correlation between X^^^^ and xj:"^^^ equal to -0.898 in the stationary 
distribution in this numerical example. The individual mean parameters are unbiased but 
with standard deviations five times larger than the sum. There is a moderate negative 
correlation between /32i and (322 = —0.53, p < 0.001). 

The estimation method provides estimates for the b"s also, given by the last values 
returned by the internal optimization step in the last round of the overall algorithm. An 



'{2)i 



;,(2) 
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Table 2: Ornstein-Uhlenbeck model: Monte Carlo maximum likelihood estimates and 95% confidence 
intervals from 1000 simulations of model (181, using an order K = 2 density expansion. 



Ti'ue parameter values 






Estimates for M = 


7, 2(n + 1) = 40 




oil 02 


Ai 


Pl2 




.(2) 


d?' 


Pll 




1 1.5 


3 


2.5 


Mean [95% CI| 


l.m [0.59, 1.40] 


1.50 ]1.00, 1.97] 


3.03 ]2.50, 3.59] 


2.50 [2.49, 2.51[ 








Skewness 


0.19 


-0.32 


0.21 


3.17 








Kurtosis 


5.27 


5.73 


3.29 


59.60 


Ai iin 




""2 




''21 


,5(2) 




4^' 


1.8 2 


0.3 


0.5 


Mean [95% CI[ 


1.61 [0.80, 2.14] 


2.30 ]1.56, 3.63] 


0.307 ]0.274, 0.339] 


0.500 [0.494, 0..508[ 








Skewness 


-1.13 


1.17 
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^22 


45 100 
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25 


Mean [95% CI[ 
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120.97 ]5.90, 171.62] 
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98.60 [4.92, 171.62[ 
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-0.17 
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True parameter 
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Estimates for M = 
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Mean [95% CI] 
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0.500 [0.495, 0.503] 








Skewness 
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Kurtosis 
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3.00 


41.16 


I'll vu 




i'22 




-(2) 




-(2) 
"21 




45 100 
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25 


Mean [95% CI[ 


83.35 [22.15, 171.62] 


114.16 ]18.18, 171.62] 


105.00 ]6.04, 171.62] 


84.61 [7.36, 171.62[ 








Skewness 
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-0.26 


0.24 
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(a) XI 



(b) X 



(2)1 



Figure 4: Ornstein-Uhlenbeck: simulated data (circles connected by straight lines), fit of X^^-** (panel (a)) 
and of (panel (b)) from the SDMEM ^ for (M,2(n + 1)) = (7,40). For each coordinate of the 

system the panels report the empirical mean curve of the SDMEM (smooth solid line), 95% empirical 
confidence curves (dashed lines) and five simulated trajectories. 
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Figure 5: Ornstein-Uhlenbeck: scatterplot matrix of the estimates {ce'"i \ cS^^ , fi^^L i ^'vi ' /^2i'' j /^22'' ' '''i^'' ; '^2^'' ) 
for (M,2(n+ 1)) = (20,40). 



-^(2) (2) 

equivalent strategy is to plug {6 , ^' ) into ([9| and then minimize — /(b*) w.r.t. b* 

and obtain b . The estimation of the random effects is fast because we make use of the 
explicit Hessian, and for this example only 2-3 iterations of the internal step algorithm were 
necessary. We estimated the b*'s by plugging each of the 1000 sets of estimates into (|9]), 
thus obtaining the corresponding 1000 sets of estimates of b*. In Figure [g] boxplots of the 
estimates of the four random effects are reported for M = 7, where estimates from different 
units have been pooled together. For both M = 7 and M = 20 the estimates of the random 
effects have sample means equal to one, as it should be given the distributional hypothesis. 
The standard deviations of the true random effects are given by 1/y^ and thus equal 0.15, 
0.1, 0.1 and 0.2 for h\^^ b\2, &21 ^iid 6225 respectively. The empirical standard deviations of 
the estimated random effects for M = 7 are 0.09, 0.09, 0.12 and 0.11, whereas for M = 20 
they are 0.09, 0.06, 0.08 and 0.09. 

The parameters could be estimated by plugging the exact transition density (19) into 
^ to form ^ and then maximize (10). However, the effort required for the estimation 
algorithm to converge is computationally costly, both using the analytic expression for the 
Hessian of / in (10) or the one obtained using AD, since the Hessian has a huge expression 
when using the exact transition density. This problem is not present when using the density 
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Figure 6: Ornstein-Uhlenbeck: boxplots of the random effects estimates b 
(M,2(n + 1)) = (7,40). 



for the SDMEM (18 1 for 



expansion because the expansion consists of polynomials of the parameters. 

5.3. The square root SDMEM 

The square root process is given by 

dXt = -l3{Xt - a)dt + a^/XtdWt. 

This process is ergodic and its stationary distribution is the Gamma distribution with 
shape parameter l^aja^ and scale parameter (T^/(2/?) provided that /3 > 0, a > 0, a > 0, 
and 2/3a > o"^. The process has many applications: it is, for instance, used in mathematical 



finance to model short term interest rates where it is called the CIR process, see Cox et al. 



(1985). It is also a particular example of an integrate-and-fire model used to describe the 



evolution of the membrane potential in a neuron between emission of electrical impulses. 



see e.g. Ditlevsen and Lansky (2006) and references therein. In the neuronal literature it 



is called the Feller process, because William Feller proposed it as a model for population 
growth in 1951. Consider the SDMEM 



a 



a'')dt 



1,...,M. 



(20) 
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Assume a* ~ B{pa,Pa), 5"* ~ CAf{pai,plJ ^.nd /3* ~ CNipjB^.p^p^). Here CJ\f{-,-) denotes 
the (standard or 2-parameter) log-normal distribution and B{pa,Pa) denotes the (general- 
ized symmetric) Beta distribution on the interval [a,b], with density function 

1 (z-a)P^-\b-zy"-^ 

where B{-,-) is the beta function and a and b are known constants. For ease of inter- 
pretation, assume the individual parameters /3* and a* to have unknown means /3 and a 
respectively, e.g. assume /3* = /3 + /3* and 5"* = a + a*, /3* and a* being zero mean random 
quantities. This implies that (3 and a do not need to be estimated directly: in fact the 
estimate for P results determined via the moment relation /3 = exp(p^j + p'^p^/2) and can 
be calculated once estimates for p/j^ and p^^ are available. Similarly, an estimate for a can 
be determined via cr = exp(po-i + p'^^/2) by plugging in the estimates for po-i and 

The parameters to be estimated are 9 = a, = {Pa,Pi3i,Pi32^PcTi,Pa2) a^id b* = 
(a*, /3*, cr*). To ensure that XI stays positive it is required that 2{a + Q!*)/3Y(o"*)^ > 1. 
This condition must be checked in each iteration of the estimation algorithm. The means 
and variances of the population parameters added the random effects are 



E{a + a') = a + {a + b)/2 
W) = (r = expip^,+plj2) 
E0^) = P = expipfs,+plJ2) 



Var(a + a') = (6 - a)V(4(2p« + 1)), 
Var(a*) = (exp(p^J _ l)exp(2p^, +pIJ, 
VaT0') = (exp(pJJ - l)exp(2p^, 



For fixed values of the random effects, the asymptotic mean for the experimental unit i is 
a + a*. In most applications this value should be bounded within physical realistic values, 
and thus the Beta distribution was chosen for a*, since the support of the distribution of 
a + a* is then [a + a, a + 6]. As in the previous examples, 1000 simulations were performed 
by generating equidistant observations in the time interval [0, 1] with the following setup: 
{XQ,a,Pa,Pi3i,Pi32,Pai,Pa2) = (1,3,5,0,0.25,0.1,0.3) with fixed constants [a,b] = [0.1,5]. 
The coefficient of variations for a + a*, /3* and a* are then 13.3%, 25.4% and 30.7%, 
respectively. The estimates obtained using an order K = 2 density expansion are given 
in Table [s] A positive bias for a^'^^ is noticeable, however results are overall satisfactory, 
even using small sample sizes. Bias in estimates of drift parameters on finite observation 
intervals is a well known problem, and especially the speed parameter /3 in mean reverting 



diffusion models is known to be biased and highly variable. In Tang and Chen (2009) the 
biases for /3 in the OU and the square root model are calculated to be on the order of T, 
where T is the length of the observation interval, and thus increasing n does not improve 
the estimates unless the observation interval is also increased. 

As described in the Ornstein-Uhlenbeck example, we have verified that the small sample 
distributions for the estimates of a\ /3* and a* have the expected characteristics: e.g. in 
the case {M,n -|- 1) = (5,7), by pooling together estimates from different units we have 
the following means (standard deviations) 2.61 (1.40), 0.95 (0.38) and 1.06 (0.29) for the 
estimates of a*, /3* and a*, respectively. These values match well with the first moments 
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Table 3: The square root model: Monte Carlo maximum likelihood estimates and 95% confidence intervals 
from 1000 simulations of model ( |20| , using an order K = 2 density expansion. Determined parameters 
are denoted with (*), i.e. true values for /3 and a are determined according to the moment relations 
/3 = exp(p^j +p^^/2) and a = exp(po-i +pj,^l'^)- Estimates for determined parameters are calculated by 
plugging the estimates of pf^^ ^ and p^^ ^ obtained from each of the 1000 Monte Carlo simulations into the 
moment relations, then averaging over the 1000 determined values. 



True parameter values 






Estimates for A/ 


— 5, n -1 1 — 7 




c I3(*) 


" n 


Pa 






,3(2) (t) 


5.(2) (*) 


pi" 


3 1.03 


1.16 




Mean |95% CI] 


4.06 [1,52, 9.85] 


1.14 [0.51, 1.69[ 


1.13 [0.76, 1.60[ 


8.80 [0.94, 112.84[ 








Skewness 


1.23 


0.49 


0.60 


4.37 








Kurtosis 


4.46 


3.20 


4.72 


22.47 


PSl P/32 


P^i 


P^2 




.(2) 


pS 


P<1' 


-(2) 
P!^2 


0.25 


0.1 


0.3 


Mean |95% CI] 
Skewness 


-0.038 [-0.824, 0.177] 
-2.92 


0.372 [0.001, 1.000] 
0.79 


0.082 [-0.278, 0.450] 
-0.20 


0.173 [0.001, 0.,561] 
0.88 








Kurtosis 


16,(17 


2.10 


4.16 


4.00 


True parameter values 






Estimates for AI - 


= 10, n 4-1 = 20 




c /3{*) 


<^ (*) 


Pa 




a(2) 


/3<2) (*) 


»(2) (.) 


-(2) 

Pa 


3 1.03 


1.16 


5 


Mean |95% CI] 


4.43 [1.85, 9.63] 


1.21 [0.44, 1.69[ 


1.15 [0.88, 1.48[ 


5.31 [0.99, 64.63[ 








Skewness 


1.01 


-0.04 


0.44 


6.35 








Kurtosis 


3.65 


2.66 


3.44 


46.77 


P/Si P/32 




Pt^2 






P£' 


Pi^i' 


-(2) 
P„2 


0.25 


0.1 


0.3 


Mean |95% CI] 


-0.045 [-0.953. 0.154] 


0.487 [0.001. 1[ 


0.108 [-0.166, 0.376] 


0.153 [0.001. 0.447] 








Skewness 


-3.04 


0.16 


-0.03 


0.71 








Kurtosis 


14.65 


1.40 


3.29 


2.84 


True parameter values 






Estimates for AI - 


= 20, n 4-1 = 20 




c I3{*) 


<T {*) 


Pa 




a(2) 


$m (*) 


5-(2) (•) 


pL^' 


3 1.03 


1.16 




Mean [95% CI] 


4.00 [2.35, 6.78] 


1.21 [0.97, 1.68[ 


1.15 [0.98, 1.35[ 


2.33 [1.00, 5.00[ 








Skewness 


1.50 


0.27 


0.27 


12.78 








Kurtosis 


6.74 


2.98 


3.38 


174.05 


PSl P/32 


P^i 


PiJ2 




-(2) 




P<1' 


g(2) 

Po2 


0.25 


0.1 


0.3 


Mean [95% CI] 


-0.011 [-0.069, 0.042] 


0.498 [0.010, 1.000] 


0.101 1-0.061, 0.256] 


0.27 [0.13, 0.41[ 








Skewness 


-5.97 


0.22 


-0.02 


-0.01 








Kurtosis 


47.98 


1.59 


3.17 


3.17 



of the true random effects E{a') = (0.1 + 5)/2 = 2.55, E(/3*) = 1.03 and E{a') = 1.16 and 
less well with the standard deviations SD^i = 0.74, SD^,; = 0.26 and SD^-i = 0.35. Average 
estimation time on a dataset of dimension {M,n + 1) = (10,20) was around 95 seconds 
and around 160 seconds when (M, n + 1) = (20, 20), using a Matlab program on an Intel 
Core 2 Quad CPU (3 GHz). 



6. CONCLUSIONS 



An estimation method for population models defined via SDEs, incorporating random 
effects, has been proposed and evaluated through simulations. SDK models with ran- 
dom effects have rarely been studied, as it is still non-trivial to estimate parameters in 
SDEs, even on single/individual trajectories, due to difficulties in deriving analytically the 
transition densities and the computational cost required to approximate the densities nu- 
merically. Approximation methods for transition densities is an important research topic, 
since a good approximation is necessary to carry out inferences based on the likelihood 
function, which guarantees well known optimal properties for the resulting estimators. Of 



the several approximate methods proposed in the last decades (see e.g. S0rensen (2004) and 



S0rensen 


(2004 


) and 


ested by i 


\it-Sahalia 



(2008) for the case of multidimensional SDEs, since it results in an accurate closed-form 



approximation for px (Jensen and Poulsen (2002)) 



In this work SDEs with multiple random effects have been studied, moving a step 
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forward with respect to the results presented in Picchini et al. (2010), where Gaussian 
quadrature was used to solve the integrals for a single random effect. The latter approach 
results unfeasible when there are several random effects because the dimension of the 
integral grows. In fact, it may be difficult to numerically evaluate the integral in ([s]) and 
([7]) when b* G -B C M^, with q much larger than 2, and efficient numerical algorithms are 
needed. As noted by Booth et al. (2001), if e.g. g = 20 one cannot count on standard 



statistical software to maximize the likelihood, and numerical integration quadrature is only 
an option if the dimension of the integral is low, whereas it quickly becomes unreliable when 



the dimension grows. Some references are the review paper by Cools (2002), Krommer and 



Ueberhuber (|1998) and references therein, or one of the several monographs on Monte Carlo 



methods (e.g. Ripley (2006)). In the mixed-effects framework the amount of literature 



devoted to the evaluation of g-dimensional integrals is large, see e.g. Davidian and Giltinan 



(2003), Pinheiro and Bates (1995), McCuUoch and Searle (2001) and Pinheiro and Chao 



(2006). We decided to use the Laplace approximation because using a symbolic calculus 
software it is relatively easy to obtain the Hessian matrix necessary for the calculations, 
which results useful also to speed up the optimization algorithm. 

Computing derivatives of long expressions can be a tedious and error prone task even 
with the help of a software for symbolic calculus. In those cases we reverted to software 
for automatic differentiation (AD, e.g. Griewank (2000)). Although the present work 



does not necessarily rely on AD tools, it is worthwhile to spend a few words to describe 
roughly what AD is, since it is relatively unknown in the statistical community even if 



it has already been applied in the mixed-effects field (Skaug (2002); Skaug and Fournier 



(2006)). AD should not be confused with symbolic calculus since it does not produce 
analytic expressions for the derivatives/Hessians of a given function, i.e. it does not produce 
expressions meant to be understood by the human eye. Instead, given a program computing 
some function h{u), the application of AD on h{u) produces another program implementing 
the calculations necessary to compute gradients, Hessians etc. of h{u) exactly (to machine 
precision); furthermore, AD can differentiate programs including e.g. for loops or if -else 
statements, which are outside the scope of symbolic differentiation. See http://www. 
[autodif f . or g for a list of AD software tools. However, the possibility of easily deriving 
gradients and Hessians using AD comes at a price. The code produced by AD to compute 
the derivatives of h{u) may result so long and complex that it might affect negatively 
the performance of the overall estimation procedure, when invoked into an optimization 
procedure. Thus, we suggest to use analytic expressions whenever possible. However, at the 
very least, an AD program can still be useful to check whether analytically obtained results 
are correct or not. Modellers and practitioners might consider the software AD Model 
Builder (ADMB Project (2009)), providing a framework integrating AD, model building 
and data fitting tools, which comes with its own module for mixed-effects modelling. 

This work has a number of limitations, mostly due to the difficulty in carrying out 
the closed-form approximation to the likelihood for multidimensional SDEs {d > 2). It is 
even more difficult when the diffusion is not reducible, although mathematical methods to 
treat this case are available (Ai't-Sahalia ( 2008[ )). Another limitation is that measurement 
error is not modelled, which is a problem if this noise source is not negligible relatively 
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to the system noise. The R PSM package is capable of modelhng measurement error 



and uses the Extended Kalman Filter (EKF) to estimate SDMEMs ( |Klim et al.| ( |2009[ )). 
EKF provides approximations for the individual likelihoods which are exact only for linear 
SDEs. The closed-form expansion considered in the present work can instead provide 
an approximation as good as desired to the individual likelihood ([s]) by increasing the 
order K of the expansion, though it can be a tedious task. Like in the present paper, 
PSM considers a Laplace approximation to multidimensional integrals, but Hessians are 
obtained using an approximation to the second order derivatives (first order conditional 
estimation, FOCE); in our work Hessians are obtained exactly (to machine precision) using 
automatic differentiation. Unfortunately, the structural differences between our method 
and PSM make a rigorous comparison between the two methods impossible, even simply in 
terms of computational times, since PSM requires the specification of a measurement error 
factor and thus both the observations and the number of parameters considered in the 
estimation are different. Finally, PSM assumes multivariate normally distributed effects 
only, whereas in our method this restriction is not necessary. 

We believe the class of SDMEMs to be useful in applications, especially in those 
areas where mixed-effects theory is used routinely, e.g. in biomedical and pharmacoki- 
netic/pharmacodynamic studies. From a theoretical point of view SDMEMs are necessary 
when analyzing repeated measurements data if both the variability between experiments 
to obtain more precise estimates of population characteristics, as well as stochasticity in 
the individual dynamics should be taken into account. 

AppendixA. REDUCIBILITY AND DENSITY EXPANSION COEFFICIENTS 

AppendixA.l. Reducible diffusions 

The following is a necessary and sufficient condition for the reducibility of a multivariate 



diffusion process (Ait-Sahalia (2008)): 



Proposition 1. The diffusion X is reducible if and only if 

13=1 q=l 

for each yi in E and triplet {i,j,k) = 1, ...,d. If cr is nonsingular, then the condition is 

where {(T^^}ij(x.) is the {i,j)-th element of cr^^ix). 

AppendixA. 2. General expressions for the density expansion coefficients 

Here are reported the explicit expressions for the coefficients of the log-density expan- 
sion ^ as given in Ait-Sahalia (2008). The use of a symbolic algebra program is advised 
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for the practical calculation of the coefficients. For two given (i-dimensional values y and 
yo of the process = 7(Xt) the coefficients of the log-density expansion are given by 



h=l 



C?{y\yo) = /"V?^Hyo + «(y-yo)Mu 

h=l -^0 

4'Hy|yo) = k[ GPiyo + uiy~yo)\yo)u'-'du. 
Jo 

for k > 1. The functions Gy^ are given by 

Gy (y |yo) = - 1^ -^^-L. +2 1. 1 + 1 a.e^) 

and for k >2 

^w, I , , , ^4'-'^(y|yo) , I ^ d'ct'\y\yo) 



4ee 



h=i h'=o \ / y y 

AppendixA.3. Coefficients of the orange trees growth SDMEM 

In model ([T2|)-(|T3|) is Y; = 7(X*) = 2^/xf/a so /iy(F/) = F/(0i+0i-cr2y/74)/(2(03 + 
03)(0i + 0^)) — 1/ (2F/), and for given values ?/*■ and y*„]^ of F/, we have 

Mo)f n ^ . <y\yf~y]-i) iyf -yi^i) 1 [ y] 



'32(03 + 0^(01 + 01) 4(03 + 0y 2 ^Vi/;-i 
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^ [y] + y) y]-i + y] y]^i + (y]y]^i) + y] y]^i + y]y]-i + 2/}- 

896(03 + + 

+ 0D(yf + yjyi-i + y]-i) + 3(yf + yfy]^^ + yj^i')) 
240(03 + 0^2(^^ + 0.) 

9(03 + 01)' + l/il/}-i(l/f + y]y]-i + y^i') 



24y!.i/!_i 



+ 



ki\2 



896(03 + 0^2(01 + 01)2 

(y^ {%yf + yj-i') + ^M-i + 10(03 + 0! 

240(03 + 0^)2(01 + 01) 



(2/}'|/}_i' + 9(03 + ct>\f) 



and 



exp 



X 



(t2a;. 



C^^\x:\x 



where C'('=)(x;.|x;._i) = C?^ ' ^ 



2v^ 



k = 0,1,2. 



AppendixA.4- Coefficients of the two-dimensional OU SDMEM 

(x(i)Vai,a;(2)7a2)^, so 



The process (15)-(16) is reducible and 7(x* 



cr-^x^ 



rfYj = (<T"^(/3 * b^)a - cr-i(/3 * b')<TYi)rft + ciW^ := ^^(77 - Y\)dt + dW* 



where rj = cr ^cx = {r]i, 772)^ and 



(1)* „,{2)iNT 



5' =1,2- If y;- = (yj ' ^y] 



and y*_i = (yj_i, l/,-_i) are two values from YJ, the coefficients of the order K = 2 density 



24 



for model (15)-(16) are given by: 
1 



expansion ^ 



-L. (2)i (2)ix2 

2^yj -y)-i) ' 



(2)i , ^ f2)i 



2\--ii -/J-/-"!! ' 



+^(42 - - m)4i + - ^2)42)') 
~\{yf - yf^{^ - viWii + 4i') + (yf-^i - ^2)(4i42 + 4i42)) 



24 



12"'21 
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+ (?/fJi-^2)(K2)' + (42)')) 



^ - ^f-i)'(-4(42)' + (4i)' - 2424i - 3(42) 



(2)^^ 



227' 



12 



21/ 



6 



+ -,0(^7' ^i-l)^(42 4l)(4l'^12 + 4l'^22) 



,(1)^2 



12 ^-"^ 



1 



*■ z/j-i)^(4i 42)(4i42 + 4i42) 
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1 

+(?/;:^;-r/2)Ki42 + 4i42)) 



,(2)j 



42)((?/r-i-^i)(4i' + 4i') 



+^(2/^^^ - - y^Di^2 - 4i)(42' + 42' - 4/ + 4i') 



AppendixA.5. 

For model ( 20 ) we have 



12 -J 

Coefficients of the square root SDMEM 



2^X1 
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and 

2q + 1 (3%' 



my: 



2Yi 2 ' 



where q = 2/3*(a + «*)/(cr*)^ — 1. For given values yj_i and of the coefficients of the 
order K = 2 density expansion are: 

+ 2^2^_/) + 12g2_3], 
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